feat(compute): add EC2 fleet compute strategy #32

Closed · MichaelWalker-git wants to merge 12 commits into main from feat/ec2-fleet-strategy

Conversation

MichaelWalker-git (Contributor) commented Apr 14, 2026

Summary

  • Add EC2 fleet compute strategy with SSM Run Command dispatch — a third compute backend alongside AgentCore (default) and ECS Fargate
  • New Ec2ComputeStrategy handler: finds idle instances via tags, uploads payload to S3, dispatches via SSM AWS-RunShellScript, polls GetCommandInvocation, cancels with CancelCommand
  • New Ec2AgentFleet CDK construct: Auto Scaling Group with launch template (AL2023 ARM64), security group (443 egress only), S3 payload bucket, IAM role with scoped permissions, Docker user data for pre-pulling images
  • Wire orchestrator polling, cancel-task SSM dispatch, and task-api SSM permissions for EC2
  • Stack wiring is commented-out (same pattern as ECS) — ready to enable per-repo via blueprint compute_type: 'ec2'
  • Add instance_type field to RepoConfig and BlueprintConfig for future GPU/custom instance type support
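For orientation, here is a minimal sketch of the dispatch/poll/cancel flow the handler bullets describe, using the AWS SDK v3 clients this PR adds. The bucket layout, boot-script path, and function names are illustrative assumptions, not the PR's actual code:

```ts
import { SSMClient, SendCommandCommand, GetCommandInvocationCommand, CancelCommandCommand } from '@aws-sdk/client-ssm';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

const ssm = new SSMClient({});
const s3 = new S3Client({});

// Upload the task payload to S3, then dispatch a shell command that reads it back.
async function dispatch(bucket: string, instanceId: string, taskId: string, payload: object): Promise<string> {
  const key = `payloads/${taskId}.json`; // hypothetical key layout
  await s3.send(new PutObjectCommand({ Bucket: bucket, Key: key, Body: JSON.stringify(payload) }));
  const res = await ssm.send(new SendCommandCommand({
    DocumentName: 'AWS-RunShellScript',
    InstanceIds: [instanceId],
    Parameters: { commands: [`aws s3 cp s3://${bucket}/${key} /tmp/task.json && /opt/agent/boot.sh /tmp/task.json`] },
  }));
  return res.Command!.CommandId!; // polled later via GetCommandInvocation
}

// Poll the invocation status for a previously dispatched command.
async function poll(commandId: string, instanceId: string) {
  const inv = await ssm.send(new GetCommandInvocationCommand({ CommandId: commandId, InstanceId: instanceId }));
  return inv.Status; // Pending | InProgress | Delayed | Success | Cancelled | Cancelling | TimedOut | Failed
}

// Cancellation is a single CancelCommand call.
const cancel = (commandId: string) => ssm.send(new CancelCommandCommand({ CommandId: commandId }));
```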

Test plan

  • mise //cdk:compile — no TypeScript errors
  • mise //cdk:test — 43 suites, 697 tests all passing (including new ec2-strategy and ec2-agent-fleet tests)
  • mise //cdk:synth — synthesizes without errors (EC2 block commented out)
  • mise //cdk:build — full build including lint passes
  • Deploy with EC2 block uncommented and run an end-to-end task with compute_type: 'ec2'

Add a third compute backend (EC2 fleet with SSM Run Command) alongside
the existing AgentCore and ECS strategies. This provides maximum
flexibility with no image size limits, configurable instance types
(including GPU), and full control over the compute environment.

New files:
- ec2-strategy.ts: ComputeStrategy implementation using EC2 tags for
  instance tracking and SSM RunShellScript for task dispatch
- ec2-agent-fleet.ts: CDK construct with ASG, launch template,
  security group, S3 payload bucket, and IAM role
- ec2-strategy.test.ts and ec2-agent-fleet.test.ts: full test coverage

Wiring:
- repo-config.ts: add 'ec2' to ComputeType, add instance_type field
- compute-strategy.ts: add EC2 SessionHandle variant and resolver case
- task-orchestrator.ts: add ec2Config prop with env vars and IAM grants
- orchestrate-task.ts: enable compute polling for EC2
- cancel-task.ts: add SSM CancelCommand for EC2 tasks
- task-api.ts: add ssm:CancelCommand permission for cancel Lambda
- agent.ts: add commented-out EC2 fleet block (same pattern as ECS)
MichaelWalker-git requested a review from a team as a code owner, April 14, 2026 20:04

Copilot AI left a comment


Pull request overview

This PR adds a third compute backend (“ec2” fleet) alongside AgentCore and ECS, enabling task dispatch via SSM Run Command to tagged EC2 instances and adding the supporting CDK construct and runtime wiring.

Changes:

  • Introduces Ec2ComputeStrategy (S3 payload upload + EC2 instance selection/tagging + SSM dispatch/poll/cancel).
  • Adds Ec2AgentFleet CDK construct (ASG + instance role/SG + payload bucket + user data) and optional stack wiring.
  • Extends config/types and handlers to support compute_type: 'ec2' and EC2 cancellation/polling.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 8 comments.

Summary per file:

  • yarn.lock: locks new AWS SDK client dependencies used by the EC2/SSM/S3 integration.
  • cdk/package.json: adds @aws-sdk/client-ec2, client-ssm, client-s3 dependencies.
  • cdk/test/handlers/shared/strategies/ec2-strategy.test.ts: unit tests for EC2 strategy start/poll/stop behavior.
  • cdk/test/handlers/shared/compute-strategy.test.ts: ensures compute_type: ec2 resolves to Ec2ComputeStrategy.
  • cdk/test/constructs/ec2-agent-fleet.test.ts: asserts key resources/permissions created by Ec2AgentFleet.
  • cdk/src/stacks/agent.ts: adds commented wiring to enable the EC2 fleet backend.
  • cdk/src/handlers/shared/strategies/ec2-strategy.ts: implements the EC2 fleet compute strategy via S3 + EC2 tags + SSM.
  • cdk/src/handlers/shared/repo-config.ts: extends ComputeType to include ec2; adds instance_type.
  • cdk/src/handlers/shared/orchestrator.ts: propagates instance_type into BlueprintConfig.
  • cdk/src/handlers/shared/compute-strategy.ts: adds the EC2 handle type + resolver case.
  • cdk/src/handlers/orchestrate-task.ts: stores EC2 compute metadata and enables compute polling for EC2.
  • cdk/src/handlers/cancel-task.ts: adds SSM CancelCommand support for EC2-backed tasks.
  • cdk/src/constructs/task-orchestrator.ts: adds EC2 env vars and IAM policies for the orchestrator Lambda.
  • cdk/src/constructs/task-api.ts: adds an ssm:CancelCommand permission option for the cancel-task Lambda.
  • cdk/src/constructs/ec2-agent-fleet.ts: new ASG-based fleet construct with IAM/SG/S3/log group/user data.
  • cdk/src/constructs/blueprint.ts: allows blueprint compute.type to be ec2.


- Remove unnecessary iam:PassRole from orchestrator (EC2 strategy
  never passes a role to any API)
- Simplify ec2FleetConfig in task-api to empty object (instanceRoleArn
  was unused)
- Use CDK Tags.of() for ASG fleet tag propagation instead of no-op
  user-data tagging — instances are now tagged at launch
- Fix missing AWS_REGION in boot script by deriving from IMDS
- Eliminate shell injection risk by reading all task data from S3
  payload at runtime instead of interpolating into bash exports
- Add cleanup trap in boot script to always retag instance as idle
  on exit (success, error, or signal)
- Add try/catch rollback in startSession to retag instance as idle
  when SSM dispatch fails
- Generalize ECS-specific log messages in poll loop to be
  compute-backend-agnostic (uses strategy type label)
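A rough sketch of the two shell-safety items above (all task data read from the S3 payload at runtime, plus a cleanup trap that always retags the instance idle), written as the kind of TypeScript template literal a strategy like this would hand to SSM. Tag keys and paths here are assumptions; only infrastructure-controlled values (bucket/key) are interpolated:

```ts
// Hypothetical boot-script builder: no task data is interpolated into the
// shell; the script reads everything from the S3 payload at runtime, and the
// trap retags the instance idle on any exit path (success, error, or signal).
function buildBootScript(bucket: string, key: string): string {
  return `#!/bin/bash
set -euo pipefail
# IMDSv1 form for brevity; a hardened script would fetch an IMDSv2 token first.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
cleanup() {
  aws ec2 create-tags --resources "$INSTANCE_ID" --tags Key=bgagent:status,Value=idle
  aws ec2 delete-tags --resources "$INSTANCE_ID" --tags Key=bgagent:task-id
}
trap cleanup EXIT
aws s3 cp "s3://${bucket}/${key}" /tmp/task.json
# ...read prompt/repo/etc. from /tmp/task.json and docker-run the agent...
`;
}
```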

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.




Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated 1 comment.



- Fix malformed sed quoting in AWS_REGION derivation (ec2-strategy.ts)
- Remove unused blueprintConfig destructuring (ec2-strategy.ts)
- Scope EC2/SSM IAM permissions: condition ec2:CreateTags on fleet tag,
  scope ssm:SendCommand to fleet-tagged instances and AWS-RunShellScript
  document, separate DescribeInstances (requires resource '*')
MichaelWalker-git (Contributor, Author)

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code


1. TOCTOU race in instance selection: after tagging an instance as busy,
   re-describe to verify our task-id stuck. If another orchestrator won
   the race, try the next idle candidate instead of double-dispatching.

2. Heartbeat false-positive: EC2/ECS tasks invoke run_task() directly
   and may not send continuous heartbeats. Suppress sessionUnhealthy
   checks when compute-level crash detection (pollSession) is active,
   preventing premature task failure after ~6 minutes.

3. SSM Cancelling status: map to 'running' (transient) instead of
   'failed' to avoid premature failure while cancel propagates.

4. Fix babel parse errors in test mocks (remove `: unknown` annotations
   from jest.mock factory callbacks).
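Item 1's tag-then-verify fix, as a minimal sketch against @aws-sdk/client-ec2; the tag keys and candidate-selection wiring are assumptions:

```ts
import { EC2Client, DescribeInstancesCommand, CreateTagsCommand } from '@aws-sdk/client-ec2';

const ec2 = new EC2Client({});

// Tag a candidate busy, then re-describe to confirm our task-id stuck before
// dispatching. If another orchestrator won the race, move to the next candidate.
async function claimIdleInstance(candidates: string[], taskId: string): Promise<string | undefined> {
  for (const instanceId of candidates) {
    await ec2.send(new CreateTagsCommand({
      Resources: [instanceId],
      Tags: [
        { Key: 'bgagent:status', Value: 'busy' },
        { Key: 'bgagent:task-id', Value: taskId },
      ],
    }));
    const res = await ec2.send(new DescribeInstancesCommand({ InstanceIds: [instanceId] }));
    const tags = res.Reservations?.[0]?.Instances?.[0]?.Tags ?? [];
    if (tags.some((t) => t.Key === 'bgagent:task-id' && t.Value === taskId)) {
      return instanceId; // our claim stuck
    }
    // Lost the race: leave the winner's tags alone and try the next candidate.
  }
  return undefined; // no idle capacity
}
```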

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated 4 comments.



1. Add rollback on verify failure: if DescribeInstances throws during
   the tag-then-verify claim, roll back the busy/task-id tags so the
   instance isn't stuck.
2. Use docker container prune instead of docker system prune in cleanup
   trap to preserve cached images and avoid re-pulling on next task.
3. Add ecr:BatchCheckLayerAvailability to instance role ECR permissions
   — required for docker pull from ECR.
4. InvocationDoesNotExist now rethrows instead of returning failed,
   letting the orchestrator's consecutiveComputePollFailures counter
   handle transient propagation delays (fails after 3 consecutive).
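Combined with the earlier Cancelling-maps-to-running change, the poll mapping might read as follows (return labels assumed; this is a sketch, not the file's actual code):

```ts
import { SSMClient, GetCommandInvocationCommand } from '@aws-sdk/client-ssm';

// Errors, including InvocationDoesNotExist, are deliberately NOT caught here:
// they propagate so the orchestrator's consecutiveComputePollFailures counter
// absorbs transient SSM propagation delay and fails after 3 in a row.
async function pollStatus(ssm: SSMClient, commandId: string, instanceId: string): Promise<'running' | 'succeeded' | 'failed'> {
  const inv = await ssm.send(new GetCommandInvocationCommand({ CommandId: commandId, InstanceId: instanceId }));
  switch (inv.Status) {
    case 'Success':
      return 'succeeded';
    case 'Failed':
    case 'Cancelled':
    case 'TimedOut':
      return 'failed';
    default:
      // Pending | InProgress | Delayed | Cancelling: all transient.
      // An unknown status should at least be logged (see the review below).
      return 'running';
  }
}
```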
… container

Two bugs prevented the EC2 compute strategy from working end-to-end:

1. Python sys.path used /app but the Docker image places modules at
   /app/src — fixed to sys.path.insert(0, "/app/src").
2. GITHUB_TOKEN_SECRET_ARN was not passed to the Docker container,
   causing the agent to fail with "github_token is required" — now
   exported in the boot script and forwarded via docker run -e.

Also enables the EC2 fleet construct in agent.ts with blueprints for
krokoko/agent-plugins and aws-samples/sample-autonomous-cloud-coding-agents.
MichaelWalker-git (Contributor, Author)

End-to-End EC2 Compute Strategy Test Results

Deployed the stack with EC2 fleet enabled and ran a live test task against aws-samples/sample-autonomous-cloud-coding-agents using compute_type: 'ec2'.

Bugs Found and Fixed

  1. Python module path (/app -> /app/src): The Docker image copies src/ to /app/src/, so entrypoint.py lives at /app/src/entrypoint.py. The boot script's sys.path.insert was pointing to /app, causing ModuleNotFoundError: No module named 'entrypoint'.

  2. Missing GITHUB_TOKEN_SECRET_ARN: The ECS strategy passes this env var to the container so the agent can fetch the GitHub PAT from Secrets Manager. The EC2 strategy was missing it, causing ValueError: github_token is required. Fixed by extracting the ARN from blueprintConfig and passing it via docker run -e.

Test Task: 01KP78GJ6P8G559VH2VGW5CGNJ

Task:   "Add a CONTRIBUTORS.md file listing project contributors"
Repo:   aws-samples/sample-autonomous-cloud-coding-agents
Status: RUNNING (agent completed work, push failed due to PAT permissions on aws-samples org)

What worked end-to-end:

  • Task submitted via CLI with compute_type: 'ec2' blueprint
  • Orchestrator found idle EC2 instance, tagged it busy, dispatched SSM command
  • SSM boot script: S3 payload download, ECR login, docker pull all succeeded
  • Agent container started, resolved GitHub token from Secrets Manager
  • Agent cloned repo, explored git history, created CONTRIBUTORS.md, committed locally
  • Ran 11 turns, cost $0.16

What didn't work (not an infra issue):

  • git push and gh pr create failed with Permission denied to MichaelWalker-git — the configured PAT doesn't have write access to the aws-samples org. This is a GitHub permissions issue, not an EC2 strategy bug.

SSM Command Output (key lines)

[00:26:40] TASK Repository: aws-samples/sample-autonomous-cloud-coding-agents
[00:26:40] TASK Model: us.anthropic.claude-sonnet-4-6
[00:26:40] CMD clone: gh repo clone aws-samples/sample-autonomous-cloud-coding-agents /workspace/...
[00:26:41] CMD clone: OK
[00:26:46] CMD mise-install: OK
[00:27:13] TOOL Write: /workspace/.../CONTRIBUTORS.md
[00:27:26] TOOL Bash: git add CONTRIBUTORS.md && git commit -m "..."
[00:27:26] RESULT [ok] 1 file changed, 17 insertions(+)
[00:27:28] TOOL Bash: git push ... 
[00:27:28] RESULT [ERROR] Permission denied to MichaelWalker-git

Conclusion

The EC2 compute strategy works end-to-end. The two bugs in this commit were the only blockers. Instance lifecycle (idle -> busy -> idle tagging) and cleanup trap both function correctly.

krokoko (Contributor) commented Apr 20, 2026

@theagenticguy could you please have a look, I know you have a good understanding of what the devbox should look like

krokoko (Contributor) commented Apr 23, 2026

Output from automated review:

Positives:

  • Well-structured strategy pattern -- follows the existing ECS/AgentCore patterns consistently
  • Strong unit tests for ec2-strategy -- race condition handling, all SSM status mappings, error resilience in stopSession, handle type guards
  • Good CDK construct design -- proper cdk-nag suppressions, least-privilege SG (egress 443 only), S3 bucket with enforceSSL + blockPublicAccess + lifecycle
  • S3 payload pattern eliminates most shell injection -- task data (prompt, repo URL, etc.) correctly flows through S3 JSON rather than shell interpolation
  • Tag-then-verify with candidate loop is a reasonable approach for instance claiming with built-in retry

To address:

  1. Shell injection via githubTokenSecretArn interpolated into bash script

File: cdk/src/handlers/shared/strategies/ec2-strategy.ts (boot script construction)

The value blueprintConfig.github_token_secret_arn is interpolated directly into single-quoted shell context:
export GITHUB_TOKEN_SECRET_ARN='${githubTokenSecretArn}',
This value is read from DynamoDB at runtime. If it ever contains a single quote (via a future admin API or direct DDB write), it breaks out of the quoting and enables arbitrary shell
execution on the EC2 instance. Other task data (prompt, repo URL) is correctly passed via S3 payload -- this field should follow the same pattern.

Fix: Pass githubTokenSecretArn through the S3 payload JSON instead of interpolating into shell, or validate the value to reject shell-unsafe characters.
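If the validation route is taken, a guard along these lines would do; the regex and helper name are hypothetical:

```ts
// Refuse to interpolate anything that isn't shaped like a Secrets Manager ARN,
// so a poisoned DynamoDB record can't break out of the single-quoted export.
// (Partition matched loosely to cover gov/china regions.)
const SECRET_ARN_RE = /^arn:aws[a-z-]*:secretsmanager:[a-z0-9-]+:\d{12}:secret:[A-Za-z0-9/_+=.@-]+$/;

function assertShellSafeSecretArn(arn: string): string {
  if (!SECRET_ARN_RE.test(arn)) {
    throw new Error(`refusing to interpolate unsafe github_token_secret_arn: ${JSON.stringify(arn)}`);
  }
  return arn;
}
```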

  2. Instance role ec2:CreateTags/ec2:DeleteTags allows cross-instance tag manipulation

File: cdk/src/constructs/ec2-agent-fleet.ts:174-182

The IAM policy grants ec2:CreateTags/ec2:DeleteTags with resources: ['*'] conditioned only on ec2:ResourceTag/bgagent:fleet. A compromised instance could:

  • Modify tags on any fleet instance (steal tasks, set others to idle/busy)
  • Add the bgagent:fleet tag to non-fleet resources and then manipulate those too
  • Remove the fleet tag from other instances to deny service

Fix: Add aws:TagKeys condition to restrict modifiable tag keys to only bgagent:status and bgagent:task-id, preventing modification of bgagent:fleet itself.
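In CDK terms the tightened statement could look like this sketch (fleet tag value and attachment point assumed; attach via instanceRole.addToPolicy):

```ts
import * as iam from 'aws-cdk-lib/aws-iam';

// EC2 tagging actions still need '*' at the resource level; the conditions do
// the narrowing: only fleet-tagged instances, only the status/task-id keys.
const tagPolicy = new iam.PolicyStatement({
  actions: ['ec2:CreateTags', 'ec2:DeleteTags'],
  resources: ['*'],
  conditions: {
    StringEquals: { 'ec2:ResourceTag/bgagent:fleet': 'true' }, // assumed tag value
    'ForAllValues:StringEquals': { 'aws:TagKeys': ['bgagent:status', 'bgagent:task-id'] },
  },
});
```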

  3. Race condition in instance claiming is not atomic

File: cdk/src/handlers/shared/strategies/ec2-strategy.ts (startSession)

The tag-then-verify pattern for claiming idle instances has a TOCTOU gap. Two concurrent orchestrators could both tag the same instance within the EC2 API propagation window, both
verify successfully, and both dispatch SSM commands to the same instance. The verify step helps but isn't fully atomic.

Fix: Consider using a DynamoDB conditional write as an atomic claim mechanism, or document this as a known limitation with mitigations (e.g., boot script checks for existing
containers before starting).
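A DynamoDB conditional write as the atomic claim might look like this sketch; the table name and key shape are hypothetical:

```ts
import { DynamoDBClient, PutItemCommand } from '@aws-sdk/client-dynamodb';

// A conditional PutItem on a claims table makes exactly one orchestrator the
// winner, closing the EC2 tag-propagation window.
async function claimAtomically(ddb: DynamoDBClient, table: string, instanceId: string, taskId: string): Promise<boolean> {
  try {
    await ddb.send(new PutItemCommand({
      TableName: table,
      Item: { instance_id: { S: instanceId }, task_id: { S: taskId } },
      ConditionExpression: 'attribute_not_exists(instance_id)', // fails if already claimed
    }));
    return true;
  } catch (err) {
    if ((err as Error).name === 'ConditionalCheckFailedException') return false; // lost the race
    throw err;
  }
}
```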


High Severity Issues

  1. Empty catch block in stopSession tag cleanup -- zero logging

File: cdk/src/handlers/shared/strategies/ec2-strategy.ts (stopSession, final try/catch)

} catch {
// Swallow — instance may already be terminated
}
This discards all errors (IAM failures, throttling, network issues) with no logging whatsoever. If cleanup fails for any non-termination reason, the instance is permanently stuck as
"busy" with no diagnostic trail.

Fix: Log at warn level with the error message.
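For example (helper name hypothetical; console.warn standing in for the handler's logger):

```ts
// Keep best-effort semantics, but leave a diagnostic trail instead of a bare
// empty catch.
async function retagIdleBestEffort(retagIdle: () => Promise<void>, instanceId: string): Promise<void> {
  try {
    await retagIdle();
  } catch (err) {
    // The instance may already be terminated -- but IAM failures, throttling,
    // and network errors land here too, so they must not vanish silently.
    console.warn('ec2.stop_session.retag_failed', {
      instance_id: instanceId,
      error: err instanceof Error ? err.message : String(err),
    });
  }
}
```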

  2. Rollback catch blocks log the event but omit the actual error

File: cdk/src/handlers/shared/strategies/ec2-strategy.ts (startSession, two rollback catches)

Both SSM dispatch rollback and verify-claim rollback log "Failed to rollback..." but don't include the error object, making root cause diagnosis impossible.

Fix: Capture and include error: rollbackErr instanceof Error ? rollbackErr.message : String(rollbackErr) in both log calls.

  3. Unknown SSM statuses silently treated as "running" in pollSession

File: cdk/src/handlers/shared/strategies/ec2-strategy.ts (pollSession default case)

If AWS adds a new terminal SSM status, the system reports the command as "running" indefinitely -- the task polls for up to 8.5 hours before timing out with no indication of the
actual failure.

Fix: Add a logger.warn in the default case logging the unknown status value.

  4. Boot script path mismatch: /app/src vs /app

File: cdk/src/handlers/shared/strategies/ec2-strategy.ts (bootScript)

The EC2 boot script sets sys.path.insert(0, "/app/src") but the ECS strategy uses sys.path.insert(0, "/app"). If the Docker image has entrypoint.py at /app/entrypoint.py (matching
ECS), the EC2 strategy will fail at runtime.

Fix: Verify the correct path in the agent Dockerfile and make both strategies consistent.


Medium Severity Issues

  1. instance_type field threaded through config but never consumed

repo-config.ts and orchestrator.ts add instance_type to BlueprintConfig, but Ec2ComputeStrategy.startSession never reads it. The ASG uses a hardcoded m7g.xlarge default. This is dead
code that creates a misleading API contract.

  2. memoryId prop accepted by Ec2AgentFleet but never used

The construct accepts memoryId?: string but never references it in the constructor body (unlike EcsAgentCluster which passes it as a container env var).

  3. ec2FleetConfig typed as Record<string, never> -- confusing

In task-api.ts, this type carries no data -- it's just a truthy flag. Either use boolean for clarity or pass fleet identifiers for IAM scoping.

  4. ssm:CancelCommand granted with resources: ['*'] without documentation

In task-api.ts, the cancel Lambda gets blanket SSM permissions with no comment explaining the API limitation (SSM doesn't support resource-level restrictions for CancelCommand). Add a
NagSuppression or inline comment.

  5. Orchestrator NagSuppression reason not updated for new EC2/SSM wildcard policies

The AwsSolutions-IAM5 reason string in task-orchestrator.ts still only mentions ECS and iam:PassRole -- doesn't cover the new EC2, SSM, or S3 statements.

  6. UserData boot failures silently reduce fleet capacity

If docker pull or ECR login fails, the instance never gets tagged as idle, sits in the ASG as "healthy" but unreachable for tasks. No error reporting mechanism exists.

Fix: Add a trap handler that tags the instance as bgagent:status=boot-failed on ERR
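Sketched as CDK user data; the env var names and image wiring are placeholders, not the construct's actual values:

```ts
import * as ec2 from 'aws-cdk-lib/aws-ec2';

// A boot failure marks the instance as boot-failed instead of leaving it
// silently untagged and unreachable for tasks.
const userData = ec2.UserData.forLinux();
userData.addCommands(
  'set -euo pipefail',
  'INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)',
  `trap 'aws ec2 create-tags --resources "$INSTANCE_ID" --tags Key=bgagent:status,Value=boot-failed' ERR`,
  'aws ecr get-login-password | docker login --username AWS --password-stdin "$ECR_REGISTRY"',
  'docker pull "$AGENT_IMAGE"',
  'aws ec2 create-tags --resources "$INSTANCE_ID" --tags Key=bgagent:status,Value=idle',
);
```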

Test coverage gaps:

┌──────────┬────────────────────────────────────────────────────────────────────────────────────────┬───────────────────────────────────────────────┐
│ Priority │ Gap │ File to Update │
├──────────┼────────────────────────────────────────────────────────────────────────────────────────┼───────────────────────────────────────────────┤
│ 9/10 │ No tests for cancel-task.ts EC2 branch -- 4 scenarios untested │ cdk/test/handlers/cancel-task.test.ts │
├──────────┼────────────────────────────────────────────────────────────────────────────────────────┼───────────────────────────────────────────────┤
│ 8/10 │ No task-orchestrator construct tests for EC2 config -- env vars and IAM policies │ cdk/test/constructs/task-orchestrator.test.ts │
├──────────┼────────────────────────────────────────────────────────────────────────────────────────┼───────────────────────────────────────────────┤
│ 7/10 │ S3 bucket security properties (enforceSSL, blockPublicAccess, encryption) not asserted │ cdk/test/constructs/ec2-agent-fleet.test.ts │
├──────────┼────────────────────────────────────────────────────────────────────────────────────────┼───────────────────────────────────────────────┤
│ 7/10 │ orchestrate-task.ts EC2 compute metadata mapping untested │ cdk/test/handlers/orchestrate-task.test.ts │
├──────────┼────────────────────────────────────────────────────────────────────────────────────────┼───────────────────────────────────────────────┤
│ 6/10 │ S3 upload failure and verify-describe failure rollback paths │ ec2-strategy.test.ts │
├──────────┼────────────────────────────────────────────────────────────────────────────────────────┼───────────────────────────────────────────────┤
│ 5/10 │ No assertion that security group has zero ingress rules │ ec2-agent-fleet.test.ts │
└──────────┴────────────────────────────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────┘

scoropeza pushed a commit to scoropeza/sample-autonomous-cloud-coding-agents that referenced this pull request May 4, 2026
… → MEDIUM

PR aws-samples#52 deploy-validation uncovered that the task-input-guardrail at
HIGH strength blocks legitimate imperative-mood user input at LOW
classifier confidence — both on submit-time task descriptions and on
pr_iteration hydration of PR bodies containing ordinary developer
documentation.

Observed failures on the current deploy:

  $ bgagent submit --repo scoropeza/agent-plugins \\
      --task "Make no changes, just inspect README.md and finish."
  Error: Task description was blocked by content policy. (VALIDATION_ERROR)

  $ bgagent submit --repo scoropeza/agent-plugins \\
      --task "Please enumerate every plugin in this repo in extreme detail, one at a time."
  Error: Task description was blocked by content policy. (VALIDATION_ERROR)

  (Scenario 7-ext take 3, against PR aws-samples#32 hydration)
  Guardrail blocked: PR context blocked by content policy:
  CONTENT/PROMPT_ATTACK (LOW)

``inputStrength: HIGH`` blocks on LOW, MEDIUM, and HIGH confidence. The
PROMPT_ATTACK classifier is stochastic at LOW confidence and flags
ordinary imperative language (task descriptions written in direct,
second-person voice) as well as ordinary PR bodies containing
imperative documentation (Troubleshooting sections, build-and-test
imperatives, etc.). The Bedrock Guardrails documentation recommends
MEDIUM as the default for non-adversarial user input.

Fix: ``inputStrength: HIGH → MEDIUM`` — blocks only MEDIUM and HIGH
confidence detections, ignores LOW. Preserves the prompt-injection
defense against the clear-case patterns the classifier is confident
about, and unblocks the long tail of natural-language user input that
motivated the feature.

Handler-side behavior is unchanged: the screening paths in
``shared/create-task-core.ts`` (submit) and ``shared/context-
hydration.ts`` (pr_iteration hydration) still fail-closed on a
``GUARDRAIL_INTERVENED`` result. Only the trigger threshold moves.

CDK suite: 996 passing (unchanged; no stack-level test pinned the
previous threshold).

Refs: PR aws-samples#52 Scenario 7-extended deploy-validation, Subagent B
investigation (guardrail over-triggering on pr_iteration).
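Where the one-line change would land, sketched as an L1 guardrail construct; the construct ID, name, and messaging strings are placeholders and the stack's actual wiring may differ:

```ts
import * as bedrock from 'aws-cdk-lib/aws-bedrock';
import { Construct } from 'constructs';

function taskInputGuardrail(scope: Construct): bedrock.CfnGuardrail {
  return new bedrock.CfnGuardrail(scope, 'TaskInputGuardrail', {
    name: 'task-input-guardrail',
    blockedInputMessaging: 'Task description was blocked by content policy.',
    blockedOutputsMessaging: 'Output was blocked by content policy.',
    contentPolicyConfig: {
      filtersConfig: [{
        type: 'PROMPT_ATTACK',
        inputStrength: 'MEDIUM', // was HIGH; MEDIUM ignores LOW-confidence detections
        outputStrength: 'NONE',  // PROMPT_ATTACK applies to input only
      }],
    },
  });
}
```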
krokoko (Contributor) commented May 11, 2026

Closing for now as there are pending items; will reopen when working on this feature

krokoko closed this May 11, 2026
auto-merge was automatically disabled May 11, 2026 18:33

Pull request was closed

isadeks pushed a commit to isadeks/sample-autonomous-cloud-coding-agents that referenced this pull request May 13, 2026
…s#79 test gap aws-samples#32)

Pre-PR-aws-samples#79 the new ``taskStrandedMessage`` and ``agentErrorMessage``
helpers in slack-blocks.ts had no direct unit tests. Reviewer flagged
this as a 7/10 gap because the renderers carry the prior_status /
error_type / message_preview metadata threaded through from the
event source — silent drift in the metadata field names would
produce ugly fallback messages in production.

Adds 5 tests:

  - task_stranded WITH metadata renders the prior_status parenthetical
    (``Task stranded for org/repo (last status: RUNNING)``) so
    operators can tell at a glance whether the task hung in HYDRATING
    vs RUNNING — without the parenthetical the reviewer's "generic
    Event: ..." UX regression would resurface.
  - task_stranded WITHOUT metadata still renders cleanly (legacy
    events written before the reconciler started stamping metadata
    must not crash or leak ``undefined``).
  - agent_error with full metadata (error_type + message_preview)
    renders the rotating_light, type, and preview.
  - agent_error WITHOUT metadata stays sensible — no leaked
    ``undefined`` strings or empty ``_Type:_`` line.
  - agent_error truncates a 500-char message_preview to keep Slack
    channel UX readable.
github-merge-queue Bot pushed a commit that referenced this pull request May 13, 2026
#79)

* feat(fanout): migrate SlackNotifyFn to FanOutConsumer subscriber (#64)

Move the Slack outbound delivery off its own DynamoDB Streams consumer
onto FanOutConsumer as a per-channel dispatcher. Drops TaskEventsTable
from 2 concurrent stream readers to 1, restoring headroom for future
channels (Email, Teams, etc.) without exceeding the documented
DynamoDB Streams 2-reader-per-shard practical limit.

The PR also addresses an adversarial code review on the original
migration; the body below walks through each piece in the order it
landed.

## (a) Migration

- `cdk/src/handlers/slack-notify.ts` — rewritten as exported
  `dispatchSlackEvent(event, ddb)` plus a tagged `SlackApiError`
  class. The standalone `handler(event)` stream entrypoint is gone;
  the FanOutConsumer is now the only TaskEventsTable stream reader.
  Behaviour preserved bit-for-bit: channel_source==='slack' gate,
  terminal-event dedup via conditional UpdateItem on
  `channel_metadata.slack_notified_terminal`, threaded replies under
  the @mention or task_created message, emoji transitions
  (eyes -> hourglass -> ✅/❌/🚫/⏲), DM channel_id -> user_id rewrite,
  intermediate session+created message cleanup on terminal events.

- `cdk/src/handlers/fanout-task-events.ts` — replaces the log-only
  `dispatchToSlack` stub with a wrapper that calls dispatchSlackEvent
  and routes errors via the new typed contract (see (b) below). Slack
  defaults gain task_created, session_started, task_timed_out so the
  router fans out the lifecycle events the old SlackNotifyFn handled;
  the dispatcher's channel_source gate keeps non-Slack tasks unaffected.

- `cdk/src/constructs/fanout-consumer.ts` — adds a scoped
  `secretsmanager:GetSecretValue` grant on `bgagent/slack/*` so the
  fanout Lambda can fetch per-workspace bot tokens. Same scope the old
  SlackNotifyFn role held.

- `cdk/src/constructs/slack-integration.ts` — deletes SlackNotifyFn,
  its DynamoEventSource, its IAM policy, and its NagSuppressions
  entry. Drops the now-unused StartingPosition / FilterCriteria /
  FilterRule / lambdaEventSources imports.

After this lands, `aws lambda list-event-source-mappings` shows
exactly one consumer of the TaskEventsTable stream (FanOutFn);
verified on the dev stack with end-to-end @mention + cancel + CLI
isolation scenarios.

## (b) Review fix #1 — partial-batch retry semantics (BLOCKER)

The first review pass found that the post-migration handler silently
dropped Slack-side infra errors (DDB throttle on the GetItem,
Secrets Manager 5xx, transient Slack timeout). Pre-migration the
SlackNotifyFn handler rethrew non-SlackApiError so Lambda retried
the batch; post-migration `Promise.allSettled` swallowed the
rejection and routeEvent returned an empty list with no escalation
path to `batchItemFailures`.

routeEvent's return type changed from `NotificationChannel[]` to
`{ dispatched, infraRejections }`. The handler now pushes the
record into `batchItemFailures` whenever `infraRejections.length>0`,
so Lambda replays the record under the partial-batch contract. The
warn line on rejection is tagged `retryable: true` so operators can
alert distinctly from the channel-terminal swallow path.

GitHub got the symmetric treatment: 4xx (excluding the existing 401
and 404 handling) is now treated as a channel-terminal swallow via
`fanout.github.api_error` instead of escalating to retry.
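The partial-batch contract in sketch form (routeEvent's internals are elided; types come from @types/aws-lambda, and the shape is assumed from the commit text):

```ts
import type { DynamoDBStreamEvent, DynamoDBBatchResponse, DynamoDBRecord } from 'aws-lambda';

// routeEvent's internals (per-channel Promise.allSettled, terminal swallows)
// are elided here; only the escalation contract is shown.
declare function routeEvent(record: DynamoDBRecord): Promise<{
  dispatched: readonly string[];
  infraRejections: readonly string[];
}>;

// Any infra rejection for a record lands in batchItemFailures, so Lambda
// replays that record under the partial-batch contract; channel-terminal
// errors were already swallowed inside routeEvent.
export async function handler(event: DynamoDBStreamEvent): Promise<DynamoDBBatchResponse> {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of event.Records) {
    const { infraRejections } = await routeEvent(record);
    if (infraRejections.length > 0 && record.eventID) {
      batchItemFailures.push({ itemIdentifier: record.eventID });
    }
  }
  return { batchItemFailures };
}
```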

## (c) Review fix #2 — split SlackApiError into terminal + retryable

Originally any `!result.ok` Slack response was wrapped in
SlackApiError and swallowed. That collapsed retryable codes
(`ratelimited`, `service_unavailable`, `internal_error`,
`fatal_error`, `request_timeout`) into the same swallow as
`channel_not_found` — a tier-1 Slack outage would silently drop
every message.

Introduced `TERMINAL_SLACK_API_ERRORS` set + `classifySlackError`
helper. Terminal codes still throw SlackApiError (router swallows).
Retryable codes throw a plain Error so the router classifies them
as infra rejections and Lambda replays.
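A minimal sketch of the split; the set below is an illustrative subset, not the full list of terminal codes:

```ts
class SlackApiError extends Error {
  constructor(readonly code: string) {
    super(`Slack API error: ${code}`);
    this.name = 'SlackApiError';
  }
}

const TERMINAL_SLACK_API_ERRORS: ReadonlySet<string> = new Set([
  'channel_not_found', 'is_archived', 'invalid_auth', 'token_revoked', 'missing_scope',
]);

function classifySlackError(code: string): Error {
  return TERMINAL_SLACK_API_ERRORS.has(code)
    ? new SlackApiError(code) // channel-terminal: the router swallows, no retry
    : new Error(`retryable Slack API error: ${code}`); // escalates to batchItemFailures
}
```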

## (d) Review fix #3 — NOTIFIABLE_EVENTS / CHANNEL_DEFAULTS drift

The original migration added task_created/session_started/task_timed_out
to CHANNEL_DEFAULTS.slack but the dispatcher's NOTIFIABLE_EVENTS gate
already excluded several events the router was subscribing Slack to
(agent_error, pr_created, task_stranded). Result: Slack was reported
as `dispatched` for events it silently dropped — telemetry lied,
agent_error never reached operators on Slack-origin tasks, and
task_stranded rendered the generic "Event: task_stranded for
owner/repo" fallback (UX regression).

Added render cases for task_stranded and agent_error in slack-blocks.ts
and added them to NOTIFIABLE_EVENTS. Forward-compat approval_required
and status_response stay out of NOTIFIABLE_EVENTS until their emitters
ship; a new cross-file consistency test in fanout-task-events.test.ts
fails if anyone re-introduces the drift.

The Slack dispatcher wrapper now passes `effectiveEventType` so an
agent_milestone(pr_created) wrapper is unwrapped before NOTIFIABLE_EVENTS
matching. Without the rewrite, the dispatcher would short-circuit
on the wrapper string `agent_milestone`.

## (e) Review fix #4 — conditional UpdateItem on lifecycle persists

Once the BLOCKER fix made batches retry, the original task_created
and session_started UpdateItem calls became hazardous: a Slack POST
that succeeded but whose follow-up UpdateItem failed transiently
would, on retry, post a second root and overwrite slack_thread_ts —
orphaning every threaded reply that had threaded under the first ts.

Both UpdateItems now carry an `attribute_not_exists` ConditionExpression
on the relevant `channel_metadata.slack_*_msg_ts`. On
ConditionalCheckFailedException the handler logs at info, deletes
the duplicate Slack message via `chat.delete`, and returns. Sibling
retry wins the race; the duplicate is cleaned up.
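The guarded persist, sketched; attribute paths follow the text above, and the delete callback stands in for the chat.delete cleanup:

```ts
import { DynamoDBClient, UpdateItemCommand } from '@aws-sdk/client-dynamodb';

// First writer wins the conditional persist; a losing retry deletes its own
// duplicate Slack post so the original thread anchor survives.
async function persistRootTs(ddb: DynamoDBClient, table: string, taskId: string, ts: string,
                             deleteMessage: (ts: string) => Promise<void>): Promise<void> {
  try {
    await ddb.send(new UpdateItemCommand({
      TableName: table,
      Key: { task_id: { S: taskId } },
      UpdateExpression: 'SET channel_metadata.slack_created_msg_ts = :ts',
      ConditionExpression: 'attribute_not_exists(channel_metadata.slack_created_msg_ts)',
      ExpressionAttributeValues: { ':ts': { S: ts } },
    }));
  } catch (err) {
    if ((err as Error).name !== 'ConditionalCheckFailedException') throw err;
    await deleteMessage(ts); // sibling retry won the race; clean up our duplicate
  }
}
```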

## (f) Dev-stack regression: drop pr_created from Slack defaults

Live verification surfaced a UX duplication: pr_created (subscribed
in CHANNEL_DEFAULTS.slack as the original §6.2 design called for) and
task_completed both rendered messages with View PR buttons, posted
seconds apart. The original SlackNotifyFn had silently dropped
pr_created (NOTIFIABLE_EVENTS gate), so users hadn't relied on it.

Removed pr_created from CHANNEL_DEFAULTS.slack and from
NOTIFIABLE_EVENTS, and removed the prCreatedMessage renderer.
GitHub keeps pr_created (its edit-in-place comment surface
genuinely benefits from the early checkpoint).

## Verification

- mise //cdk:compile  — clean
- mise //cdk:test     — 1183 / 1183 pass (8 net-new tests added for
                        the review fixes: NOTIFIABLE_EVENTS drift
                        guard, retryable Slack codes, GitHub 4xx
                        swallow, infra rejection escalation,
                        SlackApiError swallow, task_stranded render)
- mise //cdk:eslint   — clean
- mise //cdk:synth    — confirms exactly one Lambda::EventSourceMapping
                        on TaskEventsTable, pointing at FanOutFn
- Dev-stack scenarios — @mention happy path, Cancel button, CLI submit
                        (channel_source=api -> zero Slack dispatches,
                        GitHub edit-in-place still fires)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(fanout): retry GitHub 403/429 instead of swallowing as terminal (#79 review #1)

PR #79 review found that the new 4xx terminal-swallow path treats
HTTP 403 and 429 as channel-terminal — but on GitHub these are
transient rate-limit responses (403 with "API rate limit exceeded",
429 "Too Many Requests"). Under a reconciliation wave that touches
many tasks, an entire window of GitHub comment updates would be
permanently dropped with only a warn log.

Carve out 403 and 429 from the swallow guard so they propagate as
infra rejections through ``Promise.allSettled``. The record lands
in ``batchItemFailures`` and Lambda replays until the rate-limit
window clears (or DLQs after ``retryAttempts``).

Test coverage: parametrized over 403 + 429 with a GitHubCommentError
mock at the helper boundary, asserting the record's eventID surfaces
in ``batchItemFailures`` rather than being absorbed.

* fix(fanout): guard Slack Secrets Manager grant on a prop (#79 review #2)

Every other external-service grant in FanOutConsumer (taskTable,
repoTable, githubTokenSecret) is gated by ``if (props.X)``, so a
deployment that hasn't onboarded the corresponding service stays
free of dangling IAM permissions. The original migration broke the
pattern with an unconditional ``bgagent/slack/*`` Secrets Manager
grant — dev stacks without Slack onboarding ended up holding read
permission on a resource pattern they never use, with a misleading
``cdk-nag AwsSolutions-IAM5`` suppression reason.

Adds an optional ``slackSecretArnPattern`` prop on
``FanOutConsumerProps``; the policy statement is only attached when
the prop is set. ``cdk/src/stacks/agent.ts`` now computes the
``bgagent/slack/*`` ARN inline and passes it through, mirroring the
other guarded props. ``ArnFormat`` and ``Stack`` imports moved out
of fanout-consumer.ts since the construct no longer needs them.

No changes to live behaviour — agent.ts always passes the prop, so
the IAM policy still attaches in production. The dispatcher will
log-and-fail-retry on a missing pattern (covered by review #3 fix).
Test gap covering the construct itself ships in a follow-up commit
(test gap #34).

* fix(fanout): throw on missing TASK_TABLE_NAME env var (#79 review #3)

Pre-fix: when ``TASK_TABLE_NAME`` was unset on a Slack-subscribed
event, ``dispatchSlackEvent`` returned silently after a warn line.
The router counted Slack as ``dispatched`` and a broken stack
quietly dropped every Slack notification — operators only saw it
in the warn-rate metric, with no rejected-channel signal.

Post-fix: throw a plain Error so the rejection propagates as an
infra rejection through ``Promise.allSettled``. The router pushes
the record into ``batchItemFailures``, Lambda retries the batch,
the ``fanout.dispatcher.rejected`` warn fires per record, and
operators get a distinct alarm.

Also bumps the existing log line from ``warn`` to ``error`` and
attaches an ``error_id: FANOUT_SLACK_MISSING_TASK_TABLE`` so the
deployment-bug case can be distinguished from per-record failures.

Test: ``throws when TASK_TABLE_NAME env var is missing`` deletes
the env var, asserts the throw, asserts no DDB call was attempted
(env-var guard fires first).

* fix(fanout): match SlackApiError by name as well as instanceof (#79 review #7)

When a bundler ever duplicates the slack-notify module (rare with
NodejsFunction tree-shaking but possible if dual-bundled), two
distinct SlackApiError classes coexist and ``instanceof`` against
one fails for instances of the other. The dispatcher would see a
foreign-class SlackApiError, fall through to the rethrow branch,
and the router would treat it as an infra rejection — flipping a
channel-terminal swallow into infinite Lambda retries.

Add an ``err.name === 'SlackApiError'`` fallback so the swallow
branch fires either way. Mirrors the duck-typed
``GitHubCommentError`` check used elsewhere in the same handler.

Test: synthesise a plain Error with name === 'SlackApiError'
(NOT an instance of the mock's SlackApiError class) and assert
batchItemFailures stays empty — proving the swallow path catches
both shapes.

* fix(fanout): extend TERMINAL_SLACK_API_ERRORS with permission codes (#79 review #8)

Original set omitted documented Slack permission/scope failures.
Codes outside the set fall to the retryable branch, so a
misconfiguration like ``ekm_access_denied`` or ``missing_scope``
would burn 3 Lambda retries before DLQ on every event — even
though the failure is fundamentally a configuration bug that no
retry can clear.

Adds:
  - Permission/scope: missing_scope, ekm_access_denied,
    team_access_not_granted, posting_to_general_channel_denied
  - Payload shape: invalid_arguments

Reorganized the set into commented blocks (channel-shape, auth,
permission/scope, payload-shape) so future additions go in the
right bucket and the rationale stays visible.

Test coverage: parametrized over the full TERMINAL_SLACK_API_ERRORS
set (21 codes) — every one must throw SlackApiError so the router
swallows it. The existing retryable test.each remains intact and
covers the negative-class case (codes outside the set throw a
plain Error and escalate to retry).

* fix(fanout): promote Slack reaction/delete network errors to error logs (#79 review #5)

The reaction / delete helpers (``addReaction``, ``removeReaction``,
``deleteMessage``) used to log every catch at warn with a single
generic event key, lumping API-level rejections (e.g. ``no_reaction``)
together with infrastructure failures (DNS lookup, TLS handshake,
fetch timeout, JSON parse error from a hostile gateway). Operators
who alarmed on the warn rate saw a flat signal that masked
genuine infra problems.

Split the boundary:

  - API-level (``!result.ok`` after a successful HTTP call) stays at
    warn with channel-specific event keys
    (``fanout.slack.reaction_add_api_error``,
    ``fanout.slack.reaction_remove_api_error``,
    ``fanout.slack.message_delete_api_error``). These are per-message
    UX problems; operators don't page.

  - Network errors (the outer ``catch (err)`` after ``fetch``)
    promote to ``logger.error`` with dedicated event keys
    (``fanout.slack.reaction_add_network_error``,
    ``fanout.slack.reaction_remove_network_error``,
    ``fanout.slack.message_delete_network_error``) and ``error_id``s
    (``FANOUT_SLACK_REACTION_NETWORK``,
    ``FANOUT_SLACK_DELETE_NETWORK``) so each has its own alarmable
    signal. User-visible symptoms when these fire silently:
    stale emoji reactions (hourglass never swaps to ✅) and
    orphaned intermediate messages.

Behaviour unchanged: errors are still swallowed (per-message
reactions and intermediate cleanup are best-effort by design;
they must not fail the batch), but operators now get distinct
metrics for each failure class.

* fix(fanout): emit fanout.slack.dup_delete_failed on ghost-message accumulation (#79 review #6)

The conditional UpdateItem dup-delete path
(``task_created`` / ``session_started`` lifecycle persists)
calls ``deleteMessage`` to clean up the duplicate Slack message
that landed when a sibling retry won the race. The delete is
inherently best-effort — but if it fails, the duplicate becomes a
permanent ghost in the thread and operators had no way to alarm
on the rate.

Refactor ``deleteMessage`` to return a boolean (``true`` on success
or ``message_not_found``-as-already-gone, ``false`` otherwise) and
emit a dedicated ``fanout.slack.dup_delete_failed`` event with an
``error_id: FANOUT_SLACK_DUP_DELETE_FAILED`` from the dup-delete
callsites when the cleanup couldn't complete.

The terminal-event cleanup paths (``slack_session_msg_ts``,
``slack_created_msg_ts``) intentionally don't fire this event —
those paths target genuinely-stale UX cleanup, not retry-driven
duplicates, so an alarm there would be noise.

No new tests beyond the existing dup-delete coverage; the
``deleteMessage`` return value isn't yet asserted at the unit
level, but the behavior is fully exercised by the existing
``dup-delete`` integration paths (test gap #31 will add an
explicit failure-path assertion when it lands).

* chore(fanout): tighten RouteOutcome arrays to ReadonlyArray (#79 review #9)

``RouteOutcome.dispatched`` and ``infraRejections`` were typed as
plain ``NotificationChannel[]`` — which made ``readonly`` on the
property prevent reassignment but still allow callers to mutate the
underlying array via ``.push``, ``.splice``, or ``.sort``. Inconsistent
with the ``ReadonlySet<string>`` used for ``CHANNEL_DEFAULTS`` in the
same file.

Tightening to ``ReadonlyArray<NotificationChannel>`` makes the
contract honest: the router owns the arrays, callers read them.
Test suite updated to use ``[...outcome.dispatched].sort()`` where it
previously called ``.sort()`` directly — the explicit copy makes the
intent clear and would have surfaced any silent test-side mutation.

* refactor(fanout): make SlackDispatchEvent a type alias of FanOutEvent (#79 review #10)

The two interfaces were structurally identical: same five fields,
same readonly modifiers, same metadata shape. The decoupling was
purely nominal and a silent-drift footgun — adding a field to
``FanOutEvent`` (e.g. when the router starts plumbing an
``approval_required`` ID through) would not flow into
``SlackDispatchEvent``, leaving the dispatcher unaware until a
downstream test happened to fail.

Replace with a one-line type alias:

  export type SlackDispatchEvent = FanOutEvent;

The slack-notify module now type-imports ``FanOutEvent`` from
fanout-task-events. ``import type`` is erased at compile time, so
the runtime bundle still has the one-way dep
(fanout-task-events → slack-notify) — no module-cycle hazard.

Reviewer-suggested ``Pick<FanOutEvent, 'task_id' | …>`` was
considered and rejected: the dispatcher uses every field of
``FanOutEvent``, so the Pick would just enumerate the same five
fields with extra noise. A direct alias keeps the intent obvious
and prevents drift identically.

* fix(fanout): generalize Slack dedup to cover agent_error + log Retry-After (#79 review #4)

PR #79 review #4 surfaced a sibling-channel-failure hazard: when
GitHub or Email rate-limits, the record lands in
``batchItemFailures``. On the Lambda retry, every Slack-subscribed
event for that record runs again. Terminal events were already
guarded by ``slack_notified_terminal``; ``agent_error`` was not —
operators would page twice on a single agent failure if a sibling
channel happened to fail.

Generalize the dedup mechanism. ``TERMINAL_EVENTS`` is replaced by
a ``SLACK_DEDUP_ATTRIBUTE`` map that marks each event type with the
``channel_metadata`` attribute that should guard the post:

  - 5 terminals share ``slack_notified_terminal`` (any first-arriving
    terminal claims the right; subsequent terminals dedup against it)
  - ``agent_error`` gets its own ``slack_dispatched_agent_error``
    so a duplicate agent_error doesn't reuse the terminal slot
  - ``task_created`` / ``session_started`` map to ``null`` because
    they already use the per-event ``slack_*_msg_ts`` conditional
    persists from review #1 — the conditional already provides
    full idempotency (a separate marker would be redundant)
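As a sketch, the map reads as follows; the attribute strings come from the text above, while the five terminal event names are inferred from the surrounding commits:

```ts
// null means the per-event conditional persist already provides idempotency,
// so no separate dedup marker is written.
const SLACK_DEDUP_ATTRIBUTE: Readonly<Record<string, string | null>> = {
  task_completed: 'slack_notified_terminal',
  task_failed: 'slack_notified_terminal',
  task_cancelled: 'slack_notified_terminal',
  task_timed_out: 'slack_notified_terminal',
  task_stranded: 'slack_notified_terminal',
  agent_error: 'slack_dispatched_agent_error',
  task_created: null,     // guarded by the conditional persist of slack_created_msg_ts
  session_started: null,  // guarded by the conditional persist of slack_session_msg_ts
};
```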

Also surfaces Slack's ``Retry-After`` header on rate-limited
responses through a dedicated ``fanout.slack.retryable_api_error``
warn so operators reading CloudWatch can see the recovery window
instead of guessing from sustained warn rate.

Tests:
  - logs Retry-After header on rate-limited Slack responses (new):
    asserts ``retry_after_seconds`` propagates from Slack's response
    header into the warn metadata
  - existing terminal-codes parametrized test untouched (terminal
    branch doesn't read headers)
  - existing retryable test gains a ``headers: { get: () => null }``
    stub on the fetch mock so the headers.get call doesn't crash

Reviewer suggested a per-channel dispatch bitmap as the alternative.
Rejected as premature: the duplicate-GitHub-PATCH is harmless
(idempotent), Email is still a stub, and the dedup map covers
the specific agent_error pain identified above. A bitmap would add
a new table + IAM grants + per-dispatch DDB cost for a hypothetical
problem (Slack rate-limiting AND a sibling channel failure).

* test(fanout): conditional UpdateItem race + dup-delete coverage (#79 test gap)

Adds 4 tests covering the lifecycle-persist conditional path that
review fix #1 introduced and review fix #6 hardened. Pre-PR-#79 the
only ConditionalCheckFailed coverage was the terminal-dedup path;
the new lifecycle-persist + dup-delete code lacked direct assertions
and was flagged 9/10 criticality by the reviewer.

  - task_created persist ConditionalCheckFailed → posts duplicate
    then deletes it: pins the cleanup behaviour that prevents ghost
    task_created posts in the channel
  - session_started persist ConditionalCheckFailed → posts duplicate
    then deletes it: parallel coverage for the other lifecycle
    attribute (slack_session_msg_ts)
  - dup-delete failure emits fanout.slack.dup_delete_failed with
    error_id: pins the operator-alarm signal added in review fix #6;
    asserts both the event key and the FANOUT_SLACK_DUP_DELETE_FAILED
    error_id propagate
  - chat.delete returning message_not_found is treated as success
    (no dup_delete_failed): negative-class assertion. Prevents
    false-positive alarms when the race resolves cleanly (the
    duplicate was already deleted by a prior retry).

The ghost / message_not_found tests use ``fetchMock.mockImplementation``
URL-routing rather than ``.mockResolvedValueOnce`` chains because
``updateReaction`` issues 2-3 reaction-API fetches between
chat.postMessage and chat.delete; routing by URL keeps the test
focused on the load-bearing chat.delete behaviour without coupling
to reaction call order.

* test(fanout): cover task_stranded + agent_error renderers (#79 test gap #32)

Pre-PR-#79 the new ``taskStrandedMessage`` and ``agentErrorMessage``
helpers in slack-blocks.ts had no direct unit tests. Reviewer flagged
this as a 7/10 gap because the renderers carry the prior_status /
error_type / message_preview metadata threaded through from the
event source — silent drift in the metadata field names would
produce ugly fallback messages in production.

Adds 5 tests:

  - task_stranded WITH metadata renders the prior_status parenthetical
    (``Task stranded for org/repo (last status: RUNNING)``) so
    operators can tell at a glance whether the task hung in HYDRATING
    vs RUNNING — without the parenthetical the reviewer's "generic
    Event: ..." UX regression would resurface.
  - task_stranded WITHOUT metadata still renders cleanly (legacy
    events written before the reconciler started stamping metadata
    must not crash or leak ``undefined``).
  - agent_error with full metadata (error_type + message_preview)
    renders the rotating_light, type, and preview.
  - agent_error WITHOUT metadata stays sensible — no leaked
    ``undefined`` strings or empty ``_Type:_`` line.
  - agent_error truncates a 500-char message_preview to keep Slack
    channel UX readable.

* test(fanout): cover agent_error dedup + dedup-slot isolation (#79 test gap #33)

Pre-PR-#79 review-fix #4 there was no direct test for the
``slack_dispatched_agent_error`` dedup attribute or its interaction
with the existing ``slack_notified_terminal`` slot. A future
refactor that collapsed the two slots — or renamed one of them —
would silently break the sibling-channel-failure-retry guarantee
that fix #4 added.

Adds 4 tests:

  - ``agent_error claims its own dedup attribute``: pins the
    UpdateExpression and ConditionExpression strings so a refactor
    that renames the attribute breaks loudly.
  - ``agent_error retry hits the dedup guard``: end-to-end scenario
    matching review #4 — task already has
    ``slack_dispatched_agent_error: true``, retry must short-circuit
    before chat.postMessage. Without the guard, a second
    rotating_light fires.
  - ``terminal dedup attribute is per-class``: a flaky
    task_completed-then-task_failed sequence dedups against the
    same ``slack_notified_terminal`` slot. Catches the regression
    where the orchestrator emits both terminal types and we'd
    otherwise post both ✅ and ❌.
  - ``agent_error and terminals use distinct dedup slots``: the
    important negative — having ``slack_dispatched_agent_error``
    set must NOT shadow a subsequent ``task_completed``. Pins
    the slot separation so a future merge into a single slot
    can't silently drop terminals after an agent_error.

* test(fanout): add construct-level tests for FanOutConsumer (#79 test gap #34)

The construct shipped on issue #64 with no unit-level coverage of
its IAM contract. The only synth-level signal lived inside
``slack-integration.test.ts`` ("0 EventSourceMapping") which proved
the migration didn't regress the OTHER construct. Reviewer flagged
this 6/10 — and the gap is what allowed review #2 (unconditional
Slack secret grant) to slip through in the first place.

Adds 6 tests:

  - ``attaches a single DynamoEventSource on the TaskEventsTable
    stream``: pins the architectural invariant — issue #64 was
    fundamentally about reaching exactly-one stream reader. Adding
    a second consumer must fail this test loudly.
  - ``creates a DLQ for the fanout Lambda``: pins retention period
    + presence; a DLQ-less deployment would silently drop
    poison-pill records past retryAttempts.
  - ``omits the bgagent/slack/* grant when slackSecretArnPattern
    is not provided``: the review #2 invariant. Iterates every
    IAM::Policy and asserts NONE of them grant secretsmanager:*
    on a bgagent/slack/* ARN. A regression that re-introduces the
    unconditional grant breaks this test.
  - ``attaches the bgagent/slack/* grant only when
    slackSecretArnPattern is provided``: the positive case. Pins
    the grant shape (action, effect, resource pattern).
  - ``passes TASK_TABLE_NAME env var when taskTable is provided``:
    review #3 dependency — the dispatcher throws on missing env.
  - ``omits TASK_TABLE_NAME env var when taskTable is not provided``:
    graceful degrade for dev stacks that haven't onboarded the
    TaskTable yet (matches the construct's documented contract).

* test(fanout): cover task_stranded through terminal dedup (#79 test gap #35)

The reconciler at handlers/reconcile-stranded-tasks.ts:170 emits
BOTH ``task_stranded`` and ``task_failed`` for a heartbeat-expired
task — one for the operator signal, one to drive the FAILED status
transition. Pre-PR-#79 this pair had no test coverage; reviewer
flagged this 8/10 because the visible failure mode (a paired
"Task stranded" + "Task failed" double-page in Slack) would
surface in production but be silent in CI.

Adds 2 tests:

  - ``task_stranded posts and writes the terminal dedup marker on
    first arrival``: pins that task_stranded participates in the
    shared terminal slot and renders the warning message with
    metadata. Catches a regression that omits task_stranded from
    the dedup map entirely.
  - ``task_stranded after a sibling task_failed dedups``: the
    operational scenario — task_failed already claimed
    ``slack_notified_terminal``; the subsequent task_stranded must
    short-circuit before chat.postMessage. Without this guard,
    operators get the double-page the reviewer warned about.

* fix(fanout): re-read TaskRecord before terminal cleanup to close orphan-message race

Live observation during PR #79 review verification: the same Slack
@mention happy path sometimes leaves the 🚀 task_created message
in the thread (orphaned beside the ✅ task_completed) and sometimes
deletes it cleanly. The race window:

  1. ``task_created`` stream batch posts the rocket message and
     persists ``slack_created_msg_ts`` via the conditional UpdateItem
     introduced in PR #79 review fix #1.
  2. ``task_completed`` stream batch fires ~30s later. Its initial
     GetItem races the prior UpdateItem and sees a stale
     ``channel_metadata`` WITHOUT ``slack_created_msg_ts``.
  3. The terminal cleanup branch checks
     ``channelMeta.slack_created_msg_ts`` — undefined — silently skips
     the chat.delete. The rocket message stays in the thread.

Add a fresh GetItem inside the TERMINAL_EVENTS cleanup branch, after
the dedup UpdateItem has linearized our view of the table. Any prior
``slack_*_msg_ts`` writes are visible by then, so the cleanup fires
correctly. On a re-read failure (DDB throttle / transient blip) we
fall back to the dispatch-entry snapshot and emit
``fanout.slack.cleanup_reread_failed`` so operators can alarm on
the rate.

Pre-existing race (the unconditional UpdateItem in pre-PR-#79 was
the same shape — wrote, GetItem on the next batch could miss it).
PR #79 doesn't introduce it but doesn't fix it either; this commit
does, since the live screenshot evidence appeared during review
verification.
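The re-read, sketched below; table and attribute names follow the text, and the consistent read is this sketch's choice (the commit relies on post-UpdateItem ordering):

```ts
import { AttributeValue, DynamoDBClient, GetItemCommand } from '@aws-sdk/client-dynamodb';

// Fresh GetItem after the dedup UpdateItem so prior slack_*_msg_ts writes are
// visible; on a re-read failure, fall back to the dispatch-entry snapshot.
async function freshChannelMeta(
  ddb: DynamoDBClient,
  table: string,
  taskId: string,
  entrySnapshot: Record<string, AttributeValue>,
): Promise<Record<string, AttributeValue>> {
  try {
    const res = await ddb.send(new GetItemCommand({
      TableName: table,
      Key: { task_id: { S: taskId } },
      ConsistentRead: true,
    }));
    return res.Item?.channel_metadata?.M ?? entrySnapshot;
  } catch (err) {
    // DDB throttle or transient blip: terminal delivery must not break, so use
    // the entry snapshot and emit the alarmable warn.
    console.warn('fanout.slack.cleanup_reread_failed', { error: String(err) });
    return entrySnapshot;
  }
}
```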

Tests:
  - ``terminal cleanup re-reads TaskRecord``: scripts a stale
    dispatch-entry GetItem followed by a fresh re-read GetItem with
    ``slack_created_msg_ts`` present; asserts chat.delete fires
    against the freshly-read ts.
  - ``terminal cleanup falls back to dispatch-entry snapshot when
    re-read fails``: defense-in-depth — DDB throttle on the re-read
    must not break terminal delivery; cleanup uses the entry snapshot
    and emits the fallback warn.

---------

Co-authored-by: bgagent <bgagent@noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Alain Krok <alkrok@amazon.com>
